Author

Brandon Kim, Bena Smith, Amanda Belden

Introduction

Many people consider Chicago a great place to visit, with numerous tourist attractions and a big-city culture, but it is also known for high crime rates. Our project aims to use Chicago crime data to determine whether certain neighborhoods of Chicago are associated with particular types of crime. We plan to use various modeling and visualization techniques to explore this topic and to give Chicago locals, tourists, and police a better understanding of the crime in their city.

Our three tasks will be:

  1. Using multiple unsupervised clustering methods of varying precision to determine whether specific locations (based on latitude and longitude) and times are associated with the frequency of certain crimes, particularly homicide, sexual assault, and interference with a public officer.
  2. Developing a gradient boosted decision tree model to make predictions on which type of crime may be reported given a location, time and other factors.
  3. Developing a gradient boosted decision tree model to make predictions on whether an arrest was made given the same factors.

Our goal with the first task is to help tourists and any unfamiliar or safety-conscious people in Chicago be better informed about the trends of different crimes. Our clustering analysis aims to find specific times and locations in which certain crimes are significantly more common than other crimes. Determining the centroids of these clustered points and finding their range in coordinates, times and dates will help these people be more wary of certain crimes depending on location.

Although beneficial, this type of analysis has a lot of biases and can be misleading to those who are less informed. For instance, it is important to realize that we are performing the clustering on the type of crime, rather than the frequency. This means that we will be able to determine the most common crimes given a place and time, rather than the odds of a certain crime happening. However, the size of a cluster will help people determine how often crimes happen in a given area.

Our goal with the second task is to let residents, tourists, and police choose a location and time and receive a prediction of which type of crime may occur there. This will allow locals to potentially choose places to live or routes to work, and allow tourists to make informed choices on where they should or should not visit. Additionally, it could allow police to send additional resources for that specific crime to the given location.

Our third task is to make a model to predict if an arrest was made based on information about the crime including type of crime, time of day, day of the week, and location. This will allow Chicago police to know how they might reorganize their efforts. This will also give future researchers insights into how arrests are made in Chicago.

Overall, the findings of this project are intended to benefit multiple groups of people. Tourists planning to visit Chicago could benefit from a better understanding of safe and potentially risky areas. Residents will have more information which will allow them to make better decisions on their living situation and daily activities. Finally, the Chicago police will have more insight into crime hotspots, enabling them to better distribute their resources as necessary.

The datasets used in this analysis are from the Chicago Data Portal, specifically the Crimes dataset (Chicago Data Portal) and the Homicide Map dataset (Chicago Data Portal). These are compiled by the Chicago Police Department using their system called CLEAR (Citizen Law Enforcement Analysis and Reporting), which is available to the public.

While this data includes all reported crimes published by the Chicago Police Department, it may carry inherent bias. Racial profiling by police may inflate the number of crimes reported for people of color. Additionally, people who live in lower-income or gang-affected areas may be more likely to be arrested on suspicion of a crime than people from more affluent neighborhoods. It is also important to note that this data reflects the reported crimes in Chicago, not the true number of crimes. Crimes that tend to be reported less often are unlikely to be adequately represented in this analysis. For example, domestic assault and sexual assault may be underrepresented, as these types of crimes tend to be reported less often. Thus, we must account for these biases in our conclusions and limit our generalizations, so that the groups mentioned, as well as others, are not negatively affected by our conclusions. We recognize that this data concerns sensitive information, and we will do everything possible to ensure that our results are not used in a way that may harm certain groups of people.

In conclusion, this project seeks to find patterns of crime throughout Chicago and contribute to the greater good by giving the people of Chicago access to easily understandable information about the safety of their surroundings.

Previous Work

Schreck, McGloin, and Kirk (2009) study 300 neighborhoods in Chicago over 2 years (1995-1996) and find that these neighborhoods “clearly distinguish themselves based upon the types of crimes that occur there.” Some neighborhoods are more likely to have higher rates of violent or nonviolent crime, and these trends were often the same across years. We want to look at more recent years and a longer time span to see whether these findings remain true. We would also like to focus on more specific types of crime instead of the broad categories of violent and nonviolent.

Reasoning for the different distributions of different types of crime is proposed in “Street Gang Crime in Chicago” (Block and Block 1993) which finds that street gangs in Chicago have different areas that they are concentrated in and different crimes that they engage in. “Most of the criminal activity in smaller street gangs centered on representation turf defense. The most lethal street gang hot spot areas are along disputed boundaries between small street gangs…Street gangs specializing in instrumental violence were strongest in disrupted and declining neighborhoods. Street gangs specializing in expressive violence were strongest and most violent in relatively prosperous neighborhoods with expanding populations.”

This study was carried out in 1993, but “An analysis of police responses to gangs in Chicago” (Lemmer, Bensinger, and Lurigio 2008) finds that this hot-spot nature of street gangs has continued. This is relevant when locating the areas where violent and nonviolent crimes occur in Chicago, since there are likely underlying reasons why certain crimes occur in certain areas. When we perform cluster analysis on latitude and longitude, we want to compare the results to an actual map of Chicago and potentially to current information about street gangs.

Ba et al. find in “The Role of Officer Race and Gender in Police-Civilian Interactions in Chicago” that “Chicago is also heavily segregated, [and] has a history of racial tensions between residents and police” (Ba et al. 2021). This potentially affects arrest rates, as Smith and Petrocelli write in “Racial Profiling? A Multivariate Analysis of Police Traffic Stop Data” that in the US, “Historically, minorities, and particularly African Americans, have had physical force used against them or have been arrested or stopped by police at rates exceeding their percentage in the population” (Smith and Petrocelli 2001). We would like to study the distribution of arrest rates across different locations in Chicago and across different types of crime. When analyzing these rates and locations, we would like to reference maps that include demographic information about the people living in these areas.

Different types of crimes may also occur at different times of the day. The US Department of Justice (“Violent Crime Time of Day(per 1,000 in Age Group)” 2022) finds that, “In general, the number of violent crimes committed by adults increases hourly from 6 a.m. through the afternoon and evening hours, peaks at 9 p.m., and then drops to a low point at 5 a.m. In contrast, violent crimes committed by youth peak in the afternoon between 3 p.m. and 4 p.m., the hour at the end of the school day. More than one-third (37%) of all violent crime committed by youth occurs in the 5 hour period between noon and 5 p.m. In comparison, 30% of all violent crime committed by adults occurs between 6 p.m. and 11 p.m.” We want to perform analysis on the time of day that certain types of crimes occur in Chicago and compare them to this data that is an aggregate of 45 states and DC.
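
As a rough illustration of the time-of-day comparison described above, the sketch below computes the share of incidents falling in each hour. It uses only base R; the timestamps are made up, and the names `stamps` and `hour_share` are hypothetical. The timestamp layout is assumed to match the one parsed in the homicides cleaning code.

```r
# Hypothetical incident timestamps (month/day/year hour:minute:second AM/PM).
stamps <- c("01/05/2020 09:15:00 PM", "01/06/2020 09:45:00 PM",
            "02/10/2020 03:30:00 PM", "03/11/2020 05:05:00 AM")

# Parse the strings and pull out the hour on a 24-hour clock.
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
hour_of_day <- as.integer(format(parsed, "%H"))

# Share of incidents per hour; the factor keeps hours with no incidents as 0.
hour_share <- prop.table(table(factor(hour_of_day, levels = 0:23)))
print(hour_share[hour_share > 0])
```

Hourly shares computed this way for a given crime type could then be set against the national peaks quoted above.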

(Jenness and Grattet 1996) studied crime in Denver, Colorado and Los Angeles, California. They created a model to predict types of crimes occurring at a certain location, time, and day of the week. They used a Decision Tree classifier and a Naïve Bayesian classifier and achieved 51% prediction accuracy in Denver and 54% prediction accuracy in Los Angeles for predicting the type of crime.

We would like to perform inference about crime trends with respect to location type, location and time of day, and we would also like to create a predictive model that attempts to predict crime type using time of day, location, and location type as predictors. We must be very careful about how this model is used, because we should not assume which crime someone committed based only on these non-descriptive predictors. This model should be used only for personal interest and resource organization and should not be used for prosecution in any manner. We should also be very careful about the ethical implications of police focus. We address these concerns further in the ethical implications section.

Data Cleaning

Code
crimes <- crimes %>%
  select(-c(`District`, `Community Area`)) %>%
  bind_rows(homicides) %>%
  drop_na() %>% 
  filter(`X Coordinate` != 0) %>%
  mutate(`Primary Type` = case_when(
    `Primary Type` == "CRIM SEXUAL ASSAULT" ~ "CRIMINAL SEXUAL ASSAULT", 
    `Primary Type` == "NON-CRIMINAL (SUBJECT SPECIFIED)" ~ "NON-CRIMINAL",
    `Primary Type` == "OTHER NARCOTIC VIOLATION" ~ "NARCOTICS",
    `Primary Type` == "SEX OFFENSE" ~ "CRIMINAL SEXUAL ASSAULT",
    `Primary Type` == "NON - CRIMINAL" ~ "NON-CRIMINAL",
    .default = `Primary Type`), 
    Date = mdy_hms(Date), 
    Month = as.factor(month(Date)),
    Hour = as.factor(hour(Date)), 
    Weekday = weekdays(Date),
    Ward = as.factor(Ward),
    Arrest = as.factor(Arrest), 
    # Collapse near-duplicate location descriptions into one label each
    `Location Description` = case_when(
      grepl("airport", `Location Description`, ignore.case = TRUE) ~ "Airport/Aircraft",
      grepl("aircraft", `Location Description`, ignore.case = TRUE) ~ "Airport/Aircraft",
      grepl("Tavern", `Location Description`, ignore.case = TRUE) ~ "Tavern", 
      grepl("CHA ", `Location Description`, ignore.case = TRUE) ~ "CHA",
      grepl("College", `Location Description`, ignore.case = TRUE) ~ "College", 
      grepl("CTA", `Location Description`, ignore.case = TRUE) ~ "CTA",
      grepl("RESIDENCE", `Location Description`, ignore.case = TRUE) ~ "Residence",
      grepl("SCHOOL", `Location Description`, ignore.case = TRUE) ~ "School",
      `Location Description` == "VEHICLE - OTHER RIDE SERVICE" ~ "VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER, ETC.)", 
      `Location Description` == "VEHICLE - OTHER RIDE SHARE SERVICE (E.G., UBER, LYFT)" ~ "VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER, ETC.)",
      `Location Description` == "PARKING LOT/GARAGE(NON.RESID.)" ~ "PARKING LOT/GARAGE (NON RESIDENTIAL)", 
      `Location Description` == "POLICE FACILITY/VEH PARKING LOT" ~ "POLICE FACILITY/VEHICLE PARKING LOT",
      `Location Description` == "NURSING HOME/RETIREMENT HOME" ~ "NURSING/RETIREMENT HOME",
      `Location Description` == "VEHICLE-COMMERCIAL: TROLLEY BUS" ~ "VEHICLE-COMMERCIAL",
      .default = `Location Description`),
    # Normalize separators and case so identical locations compare equal
    `Location Description` = gsub(" / ", "/", `Location Description`), 
    `Location Description` = gsub(" - ", "-", `Location Description`),
    `Location Description` = toupper(`Location Description`))

We wanted the two data sets to share the same features, so we removed District and Community Area from the crimes data before combining them. We also decided to drop all rows with NA values, as fewer than 1% of observations had an NA value in a feature we cared about.

Additionally, in the larger data set, some crime types appear under multiple labels, so we used case_when to merge the equivalent ones together.
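
Before dropping rows, it can help to check how much data would be lost. The sketch below shows one way to compute that share in base R. The data frame `toy_crimes` and its values are made up for illustration and are not the Chicago data.

```r
# Toy data standing in for the crimes table (hypothetical values).
toy_crimes <- data.frame(
  primary_type = c("THEFT", "BATTERY", NA, "ARSON", "THEFT"),
  x_coordinate = c(1150000, NA, 1160000, 1170000, 1180000),
  ward         = c(1, 2, 3, 4, 5)
)

# Share of rows with at least one NA in the columns we care about.
cols_of_interest <- c("primary_type", "x_coordinate", "ward")
incomplete <- !complete.cases(toy_crimes[, cols_of_interest])
na_share <- mean(incomplete)

# Share of NA values per column, to see which features drive the loss.
na_by_column <- colMeans(is.na(toy_crimes[, cols_of_interest]))

print(na_share)
print(na_by_column)
```

If `na_share` is small, as reported for the real data, dropping incomplete rows is unlikely to change the overall picture much.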

Code
homicides <- homicides %>%
  janitor::clean_names() %>%
  drop_na() %>%
  filter(x_coordinate != 0) %>%
  mutate(date = as.POSIXct(date, format = "%m/%d/%Y %I:%M:%S %p"),
         year = year(date))

We also clean the homicides data set on its own for some surface-level analysis.

Exploratory Analysis

We need to perform some surface-level analysis before fitting any clustering models, to gauge how effective geospatial cluster analysis is likely to be.

We can create a map view of our incident records and informally examine (eyeball) whether any clusters are visibly recognizable from their coordinates alone. To reduce visual clutter, we can create separate map visualizations for each combination of the features we are interested in (i.e. day, month, year, type of crime). This lets us examine how much each cluster varies in size, range and location, which the cluster analysis should hopefully be able to segment with a predetermined number of clusters. Specifically, we visualized this using an R Shiny app where one can select specific features and view the clusters of incidents.

For simplicity’s sake (and since shinyapps.io doesn’t allow data that exceeds 100 MB), the app only uses data on homicides, sexual assault and officer interference, as those are the crimes we particularly care about. We also include assault, robbery, motor vehicle theft and prostitution, as those are some general crimes that people are especially wary of in Chicago.

Leaflet Shiny App:

Code
include_app("https://b7iuz3-brandon-kim.shinyapps.io/ChicagoCrimeSubset/", height = "900px")

Looking at multiple clusters across different inputs, we can see that the clusters do vary in size, shape and range.

Additionally, bar plots of the number of incidents per category can point to trends worth investigating.

Bar Plot - Incidents by Crime Type:

Code
data.frame(table(crimes$`Primary Type`), stringsAsFactors = FALSE) %>%
  mutate(Crime = factor(Var1, levels = unique(Var1)[order(Freq)])) %>%
  plot_ly(x = ~Freq, y = ~Crime, type = "bar") %>%
  layout(title = "Frequency of Incidents in Chicago by Type of Crime",
         xaxis = list(title = ""), 
         yaxis = list(title = ""),
         annotations = list(
                        list(
                          x = 0.5,
                          y = -0.1,  
                          xref = "paper",
                          yref = "paper",
                          text = "Recorded by the City of Chicago between January 1st, 2001 and November 7th, 2023", 
                          showarrow = FALSE,
                          yanchor = "bottom"  
                        )
                      )
  ) 

Bar Plot - Incidents by Year:

Code
crimes %>% 
  mutate(Year = as.factor(year(Date))) %>% 
  group_by(Year) %>%
  summarize(Freq = n()) %>% 
  plot_ly(x = ~Freq, y = ~Year, type = "bar") %>%
  layout(title = "Frequency of Incidents in Chicago by Year", 
         xaxis = list(title = ""),
         yaxis = list(title = ""),
         annotations = list(
                        list(
                          x = 0.5,
                          y = -0.1,  
                          xref = "paper",
                          yref = "paper",
                          text = "Recorded instances are those identified by the City of Chicago as the types listed in the plot above", 
                          showarrow = FALSE,
                          yanchor = "bottom"  
                        )
                      )
         )

Task 1: General Cluster Analyses

Geospatial Cluster Analysis - General Crime Pattern Discernment

We want to see which types of crime are more common in different areas of Chicago. To do that, we performed k-means clustering on crime observations based on their X and Y coordinates. We chose to look at Arson, Liquor Law Violation, Gambling, Kidnapping, Concealed Carry License Violation, Criminal Trespass, Narcotics, Burglary, Interference with a Public Officer, and Criminal Sexual Assault. We chose a subset of crimes because the full list is long and the analysis could be clouded by the vast array of crime types.

Code
subsetcrimes <- crimes %>%
  filter(`Primary Type` %in% c("ARSON", "LIQUOR LAW VIOLATION", "GAMBLING", "KIDNAPPING", "CONCEALED CARRY LICENSE VIOLATION", "CRIMINAL TRESPASS", "NARCOTICS", "BURGLARY", "INTERFERENCE WITH PUBLIC OFFICER", "CRIMINAL SEXUAL ASSAULT"))

For visualization purposes, since the subset contains 1,328,825 observations (too many to plot at once), we use an animation that splits it by year. For k-means clustering, we fit the 2002 data first and then used the resulting 2002 centroids as the starting centroids for every other year and for the crime-specific plots later in this project. This keeps the clusters comparable across years and plots. The plot with many crimes is still hard to interpret because there are so many observations, so we also created an interactive table and graphic showing the proportion of each crime type within each cluster each year. We still show the plot below so the reader can visualize these clusters. We analyzed proportions of crime types within clusters rather than counts because these clusters are arbitrary and not based on population size or area.
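
The idea of reusing one year's centroids as the starting point for another year can be seen on simulated data. The sketch below is a minimal illustration in base R, not the project's code. The helper `make_year`, the point locations, and the use of `nstart = 25` for the first fit are all assumptions made for the example.

```r
set.seed(1)

# Two hypothetical "years" of points scattered around the same three locations.
make_year <- function(n) {
  centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
  idx <- sample(1:3, n, replace = TRUE)
  pts <- centers[idx, ] + matrix(rnorm(2 * n, sd = 0.5), ncol = 2)
  colnames(pts) <- c("x", "y")
  pts
}
year_a <- make_year(300)
year_b <- make_year(300)

# First year: kmeans chooses random starting centroids (best of 25 starts).
km_a <- kmeans(year_a, centers = 3, nstart = 25)

# Later year: start from the first year's fitted centroids, so cluster k
# in year_b covers roughly the same area as cluster k in year_a.
km_b <- kmeans(year_b, centers = km_a$centers)

# The fitted centroids for the two years should line up row by row.
print(round(km_a$centers, 1))
print(round(km_b$centers, 1))
```

Passing a matrix to `centers` fixes both the number of clusters and their starting positions, which is what keeps cluster labels aligned from one fit to the next.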

Code
set.seed(1)  # the 2002 fit starts from random centroids, so fix the seed
years <- 2002:2023

# Cluster one year's incidents by coordinates. `centers` is either the number
# of clusters (2002) or a matrix of starting centroids (later years).
get_clusters <- function(selected_year, centers) {
  data_by_year <- subset(subsetcrimes, Year == selected_year)
  data_by_year_red <- select(data_by_year, `X Coordinate`, `Y Coordinate`)

  km <- kmeans(data_by_year_red, centers = centers) ## line from chatgpt
  data_by_year$cluster <- km$cluster

  list(yr_df_w_centers = data_by_year, centroids = km$centers)
}

# Fit 2002 with 5 clusters and keep its centroids; they are reused as the
# starting centroids for the crime-specific analyses later on.
fit_2002 <- get_clusters(2002, centers = 5)
first_centroids <- fit_2002$centroids

# Every later year starts from the 2002 centroids so clusters stay comparable.
later_years <- map(years[-1], ~ get_clusters(.x, first_centroids)$yr_df_w_centers)
new_crimes_w_clusters <- bind_rows(fit_2002$yr_df_w_centers, later_years)
Code
animatedplot <- ggplot(new_crimes_w_clusters, aes(x = `X Coordinate`, y = `Y Coordinate`, group = interaction(Year, `X Coordinate`), color=as.factor(cluster))) +
  geom_point(alpha=0.15) +
  transition_time(Year) +
    scale_color_manual(name = "Cluster", values = c("lightblue", "orange", "green", "pink","plum"))+
  labs(color = "Cluster", 
       subtitle = "Year: {str_sub(frame_time, 1, 4)}", 
       title = "Many Crimes on X and Y Coordinates by Year with K Means Clusters", x = "X Coordinate", y = "Y Coordinate") 


#Jon Spring on Stack Overflow https://stackoverflow.com/questions/56411604/how-to-make-dots-in-gganimate-appear-and-not-transition
#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 
Code
animate(animatedplot,  height = 500, width = 800,
        duration = 20, end_pause = 10, res = 100)

Interestingly, it looks like either the number of these crimes is decreasing over time or these crimes are being reported less often.

The following table shows the proportion of each crime type within each cluster for each year, so each row adds up to 1. This lets us see which types of crime are more common in which areas and how these trends change over time.
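
To see why within-cluster proportions can tell a different story from raw counts, consider the toy example below. It is a base R sketch with made-up cluster labels and crime types; `toy` and its values are hypothetical.

```r
# Hypothetical cluster assignments and crime types for a single year.
toy <- data.frame(
  cluster = c(rep(1, 4), rep(2, 12)),
  type    = c(rep("BURGLARY", 2), "NARCOTICS", "ARSON",
              rep("BURGLARY", 3), rep("NARCOTICS", 9))
)

counts <- table(toy$cluster, toy$type)

# margin = 1 divides each row by its own total, so every row sums to 1.
proportions <- prop.table(counts, margin = 1)

print(counts)
print(round(proportions, 2))
print(rowSums(proportions))
```

Here cluster 2 has more burglaries by count (3 versus 2), but burglary makes up a larger share of cluster 1 (0.5 versus 0.25), because cluster 2 simply contains more incidents.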

Code
## chat gpt and tb on Stack Overflow https://stackoverflow.com/questions/22767893/count-number-of-rows-by-group-using-dplyr

df_proportions <- new_crimes_w_clusters %>% 
  group_by(Year, cluster, `Primary Type`) %>%
  summarise(Count = n()) %>% 
  group_by(Year, cluster) %>%
  mutate(TotalCount = sum(Count)) %>%
  mutate(Proportion = Count / TotalCount) %>%
  select(Year, cluster, `Primary Type`, Proportion)


df_proportions$cluster <- as.character(df_proportions$cluster)

## one row for year, cluster combination
df_proportions %>%
  pivot_wider(names_from = `Primary Type`, values_from = Proportion, values_fill = 0) %>%
  datatable(options = list(pageLength = 5))

The following plot visualizes the proportions of crime types in each cluster.

Code
include_app("https://3efmzi-bena-smith.shinyapps.io/ShinyAppCrimeProportions/", height = "800px")

Some observations from our table and visual:

  • The proportion of crimes of type Interference with a Public Officer increases over time in all clusters.

  • The proportion of crimes of type Narcotics decreases over time in all clusters.

  • In recent years, Cluster 4 has a higher proportion of Interference with a Public Officer crimes than the other clusters.

  • In most years, Cluster 1 appears to have a higher proportion of Burglary crimes than the other clusters.

  • The proportion of Gambling crimes has decreased over time overall, but it used to be highest in Clusters 2 and 3.

Geospatial Cluster Analysis - Homicides Pattern Discernment

We wanted to look more in depth at homicide locations in Chicago. We made an animated plot of homicide locations based on x/y coordinates and clustered this data using k-means with 5 clusters. We also looked at the arrest rate in each cluster. Although a homicide does not necessarily mean that someone is liable for a crime and can be arrested, we used arrests as a rough measure of police effectiveness in the area. For the same reason, the arrest rate does not need to be close to 1 for us to deem the police effective. However, if the arrest rate is lower in one cluster than in the others, this may be evidence that police need to focus more on that area.
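
The arrest rate within a cluster is simply the share of incidents in that cluster with an arrest. A minimal base R sketch on made-up records (`toy_homicides` is hypothetical) shows the calculation:

```r
# Hypothetical homicide records with a cluster label and an arrest flag.
toy_homicides <- data.frame(
  cluster = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  arrest  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# The mean of a logical vector is the share of TRUE values,
# so this gives the arrest rate within each cluster.
arrest_rate <- tapply(toy_homicides$arrest, toy_homicides$cluster, mean)
print(arrest_rate)
```

In this toy case cluster 1 has an arrest rate of 0.5 and cluster 2 a rate of 0.2, the kind of gap between clusters that the comparison in this section looks for.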

Code
## do k means on homicide data by year

set.seed(1)
years <- 2001:2023

# Each year starts from `first_centroids`, the 2002 centroids fitted in the
# multi-crime analysis, so cluster labels stay comparable across plots.
get_clusters <- function(selected_year) {
  data_by_year <- subset(homicides, year == selected_year)
  data_by_year_red <- select(data_by_year, x_coordinate, y_coordinate)

  km <- kmeans(data_by_year_red, centers = first_centroids) ## line from chatgpt
  data_by_year$cluster <- km$cluster

  data_by_year
}

# Cluster every year and stack the results into one data frame
new_homicides_w_clusters <- bind_rows(map(years, get_clusters))
Code
## get arrest rates for every year/ cluster combo 

# mean() of a logical vector is the share of TRUE values, i.e. the arrest rate
arrest_rates <- new_homicides_w_clusters %>%
  group_by(year, cluster) %>%
  summarise(arrest_rate = mean(arrest == "TRUE"), .groups = "drop")
Code
## add column for mean x/ y coordinate as the coordinate to display the arrest rate 

merged_df <- merge(new_homicides_w_clusters, arrest_rates, by = c("year", "cluster"))

merged_df <- merged_df %>%
  group_by(year, cluster) %>%
  mutate(mean_x_coordinate = mean(x_coordinate))%>%
  mutate(mean_y_coordinate = mean(y_coordinate))%>% 
  ungroup()
Code
graph1 <- merged_df %>% ggplot()+
  xlim(1110000, 1225000)+
  geom_point(alpha=0.15, aes(x = x_coordinate, y = y_coordinate, group = interaction(year, x_coordinate), color=as.factor(cluster)))+
  scale_color_manual(name = "Cluster", values = c("lightblue", "orange", "green", "pink","plum")) +
  new_scale_color()+
    geom_text(aes(x=mean_x_coordinate,y=mean_y_coordinate, label = paste("Arrest Rate: ", round(arrest_rate, 2) ), color = ifelse(arrest_rate > 0.45, "Above 0.45", "Below or equal to 0.45"), group = interaction(year, mean_x_coordinate))) +
    scale_color_manual(name = "Arrest Rate", values = c("purple", "hotpink"))

#Jon Spring on Stack Overflow https://stackoverflow.com/questions/56411604/how-to-make-dots-in-gganimate-appear-and-not-transition
Code
graph1.animation <- graph1 +
  theme_minimal()+
  transition_time(year) +
  labs(subtitle = "Year: {str_sub(frame_time, 1, 4)}", color ="Arrest Rate", title="Homicides in Chicago by Year with KMeans Clusters",  x = "X Coordinate", y = "Y Coordinate")

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 
#dc37 on Stack Overflow https://stackoverflow.com/questions/11838278/plot-with-conditional-colors-based-on-values-in-r 
Code
animate(graph1.animation, height = 500, width = 800, fps = 30, duration = 30,
        end_pause = 60, res = 100)

Code
anim_save("homicide_cluster_analysis.gif")

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 

Arrest rates above 0.45 are colored purple for that cluster; arrest rates at or below 0.45 are colored pink. One observation is that, as time goes on, more clusters have lower arrest rates. This may be because crime is increasing or because of changes in police organization; we would like to look at the number of homicides and see whether it increases over time. It is also interesting that cluster 2 has a higher arrest rate than the other clusters most of the time, especially in later years. It would be worthwhile to research this area further, including demographic information such as earnings data. We would also like to investigate other crimes individually to see whether the spatial distribution of crimes differs by crime type.

Geospatial Cluster Analysis - Criminal Sexual Assault Pattern Discernment

Code
sexassault_subsetcrimes <- crimes %>%
  filter(`Primary Type` %in% c("CRIMINAL SEXUAL ASSAULT"))
Code
years <- 2002:2023

# Each year starts from `first_centroids`, the 2002 centroids fitted in the
# multi-crime analysis, so cluster labels stay comparable across plots.
get_clusters <- function(selected_year) {
  data_by_year <- subset(sexassault_subsetcrimes, Year == selected_year)
  data_by_year_red <- select(data_by_year, `X Coordinate`, `Y Coordinate`)

  km <- kmeans(data_by_year_red, centers = first_centroids) ## line from chatgpt
  data_by_year$cluster <- km$cluster

  data_by_year
}

# Cluster every year and stack the results into one data frame
new_sexassault_w_clusters <- bind_rows(map(years, get_clusters))
Code
## get arrest rates for every year/ cluster combo 

# mean() of a logical vector is the share of TRUE values, i.e. the arrest rate
arrest_rates <- new_sexassault_w_clusters %>%
  group_by(Year, cluster) %>%
  summarise(arrest_rate = mean(Arrest == "TRUE"), .groups = "drop")
Code
## add column for mean x/ y coordinate as the coordinate to display the arrest rate 

merged_df <- merge(new_sexassault_w_clusters, arrest_rates, by = c("Year", "cluster"))

merged_df <- merged_df %>%
  group_by(Year, cluster) %>%
  mutate(mean_x_coordinate = mean(`X Coordinate`))%>%
  mutate(mean_y_coordinate = mean(`Y Coordinate`))%>% 
  ungroup()
Code
graph1 <- merged_df %>% ggplot()+
  xlim(1110000, 1225000)+
  geom_point(alpha=0.15, aes(x = `X Coordinate`, y = `Y Coordinate`, group = interaction(Year, `X Coordinate`), color=as.factor(cluster)))+
  scale_color_manual(name = "Cluster", values = c("lightblue", "orange", "green", "pink","plum")) +
  new_scale_color()+
    geom_text(aes(x=mean_x_coordinate,y=mean_y_coordinate, label = paste("Arrest Rate: ", round(arrest_rate, 2) ), color = ifelse(arrest_rate > 0.2, "Above 0.2", "Below or equal to 0.2"), group = interaction(Year, mean_x_coordinate))) +
    scale_color_manual(name = "Arrest Rate", values = c("purple", "hotpink"))

#Jon Spring on Stack Overflow https://stackoverflow.com/questions/56411604/how-to-make-dots-in-gganimate-appear-and-not-transition
Code
graph1.animation <- graph1 +
  theme_minimal()+
  transition_time(Year) +
  labs(subtitle = "Year: {str_sub(frame_time, 1, 4)}", color ="Arrest Rate", title="Criminal Sexual Assaults in Chicago by Year with KMeans Clusters",  x = "X Coordinate", y = "Y Coordinate")

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 
#dc37 on Stack Overflow https://stackoverflow.com/questions/11838278/plot-with-conditional-colors-based-on-values-in-r 
Code
animate(graph1.animation, height = 500, width = 800, fps = 30, duration = 30,
        end_pause = 60, res = 100)

Code
anim_save("sexassault_cluster_analysis.gif")

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 

Arrest rates above 0.2 are labeled in purple, and arrest rates at or below 0.2 are labeled in pink. Over time, more clusters have low arrest rates. In earlier years, the northernmost cluster has the highest arrest rate, but by 2022 and 2023 all clusters have similarly low arrest rates, between 0.05 and 0.1.
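The arrest rate for each year and cluster is the share of incidents in that group where an arrest was made. As a minimal sketch, the same quantity can be computed in one grouped step with base R's `aggregate()`, without growing a global data frame. The toy data frame below is hypothetical and only mimics the `Year`, `cluster`, and `Arrest` columns used above.

```r
# Hypothetical toy data standing in for the clustered crime records
toy <- data.frame(
  Year    = c(2002, 2002, 2002, 2002, 2003, 2003),
  cluster = c(1, 1, 2, 2, 1, 1),
  Arrest  = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Share of rows with an arrest in every Year/cluster group.
# Comparing against "TRUE" works for both logical and character columns.
arrest_rates <- aggregate(
  Arrest ~ Year + cluster,
  data = toy,
  FUN  = function(a) mean(a == "TRUE")
)
names(arrest_rates)[names(arrest_rates) == "Arrest"] <- "arrest_rate"

# Small lookup helper for a single Year/cluster combination
rate <- function(y, cl) {
  arrest_rates$arrest_rate[arrest_rates$Year == y & arrest_rates$cluster == cl]
}

print(arrest_rates)
```

One design difference from the `expand.grid()` approach: Year/cluster combinations with no rows are simply absent from the result instead of producing a 0/0 rate.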

Geospatial Cluster Analysis - Public Officer Interference Pattern Discernment

Code
interferencewPO_subsetcrimes <- crimes %>%
  filter(`Primary Type` %in% c("INTERFERENCE WITH PUBLIC OFFICER"))
Code
years <- c(2002:2023)

new_colnames <- append(colnames(interferencewPO_subsetcrimes), "cluster")
new_interferencewPO_w_clusters <- data.frame(matrix(ncol = length(new_colnames), nrow = 0))  # sized from the column names instead of a hard-coded 24; Thomas on stackoverflow https://stackoverflow.com/questions/25051528/error-in-adding-rows-to-an-empty-data-frame-in-r
colnames(new_interferencewPO_w_clusters) <- new_colnames

get_clusters <- function(selected_year, prev_centroids=NULL){
  
  data_by_year <- subset(interferencewPO_subsetcrimes, Year == selected_year)
  
  data_by_year_red <- select(data_by_year, `X Coordinate`, `Y Coordinate`)
  km <- kmeans(data_by_year_red, centers=first_centroids) ##line from chatgpt

  
  data_by_year$cluster <-  km$cluster

  return(list(yr_df_w_centers = data_by_year, centroids=km$centers))

}


get_clusters_helper <- function(yr){

  retval <- get_clusters(yr, prev_centroids)
  
  new_interferencewPO_w_clusters <<- rbind(new_interferencewPO_w_clusters, retval$yr_df_w_centers)
  ## global assignment (<<-) is used because map() calls this helper once per year; a for loop that collects each return value would avoid it
  
}

prev_centroids <- NULL
map(years, get_clusters_helper)
Code
## get arrest rates for every year/ cluster combo 

arrest_rates <- data.frame(matrix(ncol=3, nrow = 0)) # Thomas on stackoverflow https://stackoverflow.com/questions/25051528/error-in-adding-rows-to-an-empty-data-frame-in-r
colnames(arrest_rates) <- c("Year", "cluster", "arrest_rate") # three separate names, matching the list passed to rbind() below


years <- c(2002:2023)
clusters <- c(1:5)

  
get_arrest_rate <- function(combo){
  sel_year <- combo[1]
  sel_cluster <- combo[2]
  
  data_by_year <- subset(new_interferencewPO_w_clusters, Year == sel_year)
  data_by_year_and_cluster <- subset(data_by_year, cluster == sel_cluster)
  
  num_arrests <- sum(data_by_year_and_cluster$Arrest == "TRUE")
  total_rows <- nrow(data_by_year_and_cluster)
  
  arrest_rate <- num_arrests/total_rows
  
  arrest_rates <<- rbind(arrest_rates,  list(Year = sel_year, cluster=sel_cluster, arrest_rate=arrest_rate))

  
}

combos <- expand.grid(years, clusters) # Onyejiaku Theophilus Chidalu from educative https://www.educative.io/answers/what-is-the-expandgrid-function-in-r

apply(combos, 1,get_arrest_rate)
Code
## add column for mean x/ y coordinate as the coordinate to display the arrest rate 

merged_df <- merge(new_interferencewPO_w_clusters, arrest_rates, by = c("Year", "cluster"))


merged_df <- merged_df %>%
  group_by(Year, cluster) %>%
  mutate(mean_x_coordinate = mean(`X Coordinate`))%>%
  mutate(mean_y_coordinate = mean(`Y Coordinate`))%>% 
  ungroup()
Code
graph1 <- merged_df %>% ggplot()+
  xlim(1110000, 1225000)+
  geom_point(alpha=0.15, aes(x = `X Coordinate`, y = `Y Coordinate`, group = interaction(Year, `X Coordinate`), color=as.factor(cluster)))+
  scale_color_manual(name = "Cluster", values = c("lightblue", "orange", "green", "pink","plum")) +
  new_scale_color()+
    geom_text(aes(x=mean_x_coordinate,y=mean_y_coordinate, label = paste("Arrest Rate: ", round(arrest_rate, 2) ), color = ifelse(arrest_rate > 0.9, "Above 0.9", "Below or equal to 0.9"), group = interaction(Year, mean_x_coordinate))) +
    scale_color_manual(name = "Arrest Rate", values = c("purple", "hotpink"))

#Jon Spring on Stack Overflow https://stackoverflow.com/questions/56411604/how-to-make-dots-in-gganimate-appear-and-not-transition
Code
graph1.animation <- graph1 +
  theme_minimal()+
  transition_time(Year) +
  labs(subtitle = "Year: {str_sub(frame_time, 1, 4)}", color ="Arrest Rate", title="Interference with Public Officer in Chicago by Year with KMeans Clusters",  x = "X Coordinate", y = "Y Coordinate")

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 
#dc37 on Stack Overflow https://stackoverflow.com/questions/11838278/plot-with-conditional-colors-based-on-values-in-r 
Code
animate(graph1.animation, height = 500, width = 800, fps = 30, duration = 30,
        end_pause = 60, res = 100)

Code
anim_save("interferencewPO_cluster_analysis.gif") # separate file name so the sexual assault animation is not overwritten

#finnstats on R bloggers https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/ 

Arrest rates above 0.9 are labeled in purple, and arrest rates at or below 0.9 are labeled in pink. Over time, more clusters have higher arrest rates. The northernmost cluster also has the lowest arrest rate in most years. Our hypothesis is that this may be related to the demographics of the area: police may be less likely to arrest certain groups of people, potentially more affluent ones, for interfering with an officer. We would like to look at income and demographic distributions in Chicago to investigate this.
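`get_clusters` accepts a `prev_centroids` argument but always starts k-means from `first_centroids`. One possible alternative, sketched here on hypothetical toy data rather than the crime data, is a plain `for` loop that hands each year's fitted centroids to the next year as starting centers, which also removes the need for global assignment. The function name `cluster_by_year` and the toy columns `x` and `y` are assumptions made for illustration.

```r
set.seed(42)

# Hypothetical toy data: two well-separated spatial blobs in each of three years
make_year <- function(yr) {
  data.frame(
    Year = yr,
    x = c(rnorm(20, mean = 0), rnorm(20, mean = 10)),
    y = c(rnorm(20, mean = 0), rnorm(20, mean = 10))
  )
}
toy <- do.call(rbind, lapply(2002:2004, make_year))

cluster_by_year <- function(df, years, first_centroids) {
  centroids <- first_centroids
  pieces <- vector("list", length(years))
  for (i in seq_along(years)) {
    yr_df <- df[df$Year == years[i], ]
    # Start this year's k-means from the current centroids
    km <- kmeans(yr_df[, c("x", "y")], centers = centroids)
    yr_df$cluster <- km$cluster
    pieces[[i]] <- yr_df
    centroids <- km$centers  # carry the fitted centers into the next year
  }
  do.call(rbind, pieces)
}

# One starting center per row: (0, 0) and (10, 10)
first_centroids <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = TRUE)
clustered <- cluster_by_year(toy, 2002:2004, first_centroids)

table(clustered$Year, clustered$cluster)
```

Starting each year from the previous year's centers tends to keep cluster labels aligned across years, which matters when arrest rates are compared cluster by cluster over time.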

Multi-Factor Cluster Analysis - General Crime Pattern Discernment

We would like to see if we can find clusters that are more pure in terms of type of crime.

To analyze how crime frequencies vary with factors beyond coordinates, we need a clustering algorithm other than k-means that can accommodate categorical data. We chose k-prototypes clustering, which combines k-means for numerical data with k-modes for categorical data. We use it to determine whether certain crimes group together, which lets us draw inferences about the typical timing and setting of specific crimes.
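The way k-prototypes combines the two methods can be illustrated with its dissimilarity measure: squared Euclidean distance on the numeric variables plus a weight times the number of mismatched categorical variables. The sketch below is a hand-written illustration, not the code that `kproto` runs; the function name `kproto_distance`, the weight `lambda`, and the toy observation and prototype are all assumptions.

```r
# Mixed dissimilarity between one observation and one cluster prototype:
# squared Euclidean distance on numeric columns
# plus lambda times the count of categorical mismatches
kproto_distance <- function(obs, proto, num_cols, cat_cols, lambda = 1) {
  num_part <- sum((unlist(obs[num_cols]) - unlist(proto[num_cols]))^2)
  cat_part <- sum(unlist(obs[cat_cols]) != unlist(proto[cat_cols]))
  num_part + lambda * cat_part
}

# Hypothetical observation and prototype using column names from this report
obs   <- list(Hour = 23, Month = 12, Weekday = "Sunday",   Location = "RESIDENCE")
proto <- list(Hour = 21, Month = 12, Weekday = "Saturday", Location = "RESIDENCE")

d <- kproto_distance(
  obs, proto,
  num_cols = c("Hour", "Month"),
  cat_cols = c("Weekday", "Location"),
  lambda   = 2
)
d  # (23 - 21)^2 + 0 from the numeric part, plus 2 * 1 mismatch = 6
```

Because the numeric part is not rescaled here, variables on large scales (such as raw coordinates) would dominate the distance unless they are standardized or `lambda` is chosen accordingly.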

The issue with our current subset of data is that unequal crime frequencies can hinder the inferential power of the algorithm. Since k-prototypes clusters observations around centroids, a crime that appears far more often than the others, with only moderate variability, can dominate many of the clusters (narcotics, for example, has 671,943 records against 1,183 for concealed carry license violations). To combat this, we use a sample of that subset in which each crime type is represented equally, with 1,000 observations per type.

Crime frequencies:

Code
table(subsetcrimes$`Primary Type`) %>%
  kable %>% 
  kable_styling("striped")
Primary Type                         Freq
ARSON                               12160
BURGLARY                           394447
CONCEALED CARRY LICENSE VIOLATION    1183
CRIMINAL SEXUAL ASSAULT             58093
CRIMINAL TRESPASS                  198558
GAMBLING                            13417
INTERFERENCE WITH PUBLIC OFFICER    18131
KIDNAPPING                           6103
LIQUOR LAW VIOLATION                12883
NARCOTICS                          671943
Code
proto_data <- subsetcrimes %>%
  group_by(`Primary Type`) %>%
  sample_n(1000) %>% 
  ungroup() %>% 
  select(`Primary Type`, `X Coordinate`, `Y Coordinate`, `Location Description`, Beat, Ward, Month, Hour, Weekday)
Code
proto_analysis <- kproto(select(proto_data, -`Primary Type`), 9)
Code
proto_data$clusters <- proto_analysis$cluster

proto_data %>%
  count(clusters, `Primary Type`) %>%
  spread(key = `Primary Type`, value = n) %>%
  mutate(across(-1, ~./sum(.))) %>%
  rowwise() %>%
  mutate(HighestColumn = names(.)[-1][which.max(c_across(-1))]) %>%
  cbind(proto_analysis$centers) %>%
  kable %>% 
  kable_styling("striped")
Proportions by crime type (rows) and cluster (columns):

Primary Type                           1      2      3      4      5      6      7      8      9
ARSON                              0.176  0.105  0.107  0.177  0.079  0.016  0.126  0.102  0.112
BURGLARY                           0.133  0.122  0.113  0.165  0.139  0.009  0.084  0.130  0.105
CONCEALED CARRY LICENSE VIOLATION  0.057  0.095  0.074  0.255  0.046  0.210  0.085  0.098  0.080
CRIMINAL SEXUAL ASSAULT            0.112  0.136  0.118  0.146  0.155  0.013  0.110  0.108  0.102
CRIMINAL TRESPASS                  0.082  0.255  0.140  0.104  0.129  0.022  0.086  0.122  0.060
GAMBLING                           0.114  0.116  0.119  0.164  0.051  0.001  0.221  0.117  0.097
INTERFERENCE WITH PUBLIC OFFICER   0.135  0.124  0.092  0.166  0.078  0.008  0.164  0.114  0.119
KIDNAPPING                         0.118  0.084  0.122  0.160  0.105  0.027  0.090  0.140  0.154
LIQUOR LAW VIOLATION               0.120  0.236  0.055  0.126  0.223  0.014  0.122  0.060  0.044
NARCOTICS                          0.120  0.108  0.119  0.128  0.067  0.007  0.253  0.107  0.091

HighestColumn and centroid for each cluster:

clusters  HighestColumn                      X Coordinate  Y Coordinate  Location Description  Beat  Ward  Month  Hour  Weekday
1         ARSON                              1145299       1911907       RESIDENCE             1912  37    12     23    Sunday
2         CRIMINAL TRESPASS                  1165149       1902493       SIDEWALK              1421  27    9      17    Saturday
3         CRIMINAL TRESPASS                  1174857       1868505       SCHOOL                0314  20    11     15    Wednesday
4         CONCEALED CARRY LICENSE VIOLATION  1160445       1863995       RESIDENCE             0832  17    8      19    Sunday
5         LIQUOR LAW VIOLATION               1160133       1932023       STREET                2022  49    4      21    Saturday
6         CONCEALED CARRY LICENSE VIOLATION  1109114       1933886       OTHER                 1414  41    5      7     Saturday
7         NARCOTICS                          1149898       1898275       CTA                   1115  28    11     11    Thursday
8         KIDNAPPING                         1182919       1853812       ALLEY                 0322  6     7      10    Saturday
9         KIDNAPPING                         1178621       1836070       RESIDENCE             0424  34    5      19    Friday

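Each proportion in the table is the share of a crime type's 1,000 sampled incidents that fell into that cluster, so the values for each crime type sum to 1 across clusters, and HighestColumn names the crime type with the largest share in the cluster. In the current run no cluster is close to pure: the leading crime type accounts for at most about a quarter of its own sample in any cluster (for example, 0.255 of criminal trespass incidents in cluster 2), and most crime types are spread fairly evenly across clusters. The remaining columns of the table give the centroid of each cluster.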
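The table code divides each crime-type column by its own total. To measure how pure a cluster is, one option is to normalize within each cluster instead and take the largest within-cluster share. A minimal base R sketch on a hypothetical toy data frame (the column names `clusters` and `type` and the values are made up for illustration):

```r
# Hypothetical toy assignment of incidents to clusters
toy <- data.frame(
  clusters = c(1, 1, 1, 2, 2, 2, 2),
  type     = c("ARSON", "ARSON", "BURGLARY",
               "GAMBLING", "GAMBLING", "GAMBLING", "ARSON")
)

# Counts of each crime type within each cluster
counts <- table(toy$clusters, toy$type)

# Row-normalize so each cluster's shares sum to 1
within_cluster <- prop.table(counts, margin = 1)

# Purity per cluster: share held by its most common crime type
purity   <- apply(within_cluster, 1, max)
dominant <- colnames(within_cluster)[apply(within_cluster, 1, which.max)]

# Overall purity: fraction of incidents that belong to their cluster's dominant type
overall_purity <- sum(apply(counts, 1, max)) / sum(counts)

data.frame(cluster = rownames(counts), dominant = dominant, purity = purity)
```

A purity near 1 would mean a cluster is made up almost entirely of one crime type; with ten equally sampled crime types, a purity near 0.1 would mean the cluster is no more informative than chance.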

Tasks 2 and 3: Predictive Modeling for Crime Type and Arrests

Task 2: Predicting Crime Types Using XGBoosted Trees

Code
subsetted_crimes <- crimes %>%
  sample_n(100000) 

subsetted_crimes <- na.omit(subsetted_crimes)

subsetted_crimes$`Primary Type` <- as.factor(subsetted_crimes$`Primary Type`)

splits <- subsetted_crimes %>% 
  initial_split(0.9, strata = `Primary Type`)

training_data <- splits %>% training()
testing_data <- splits %>% testing()
Code
training_data$`Primary Type` <- as.factor(training_data$`Primary Type`)

crimes_rec <- recipe(`Primary Type` ~ `Location Description` + Beat + 
                     Ward + `X Coordinate` + `Y Coordinate` + Year + Month + 
                     Hour + Weekday + Arrest
                     , data = training_data) %>%
  step_dummy(all_nominal_predictors()) 


xtrees <- boost_tree() %>%
  set_mode("classification") %>%
  set_engine("xgboost") 

gbdt <- workflow() %>%
  add_model(xtrees) %>%
  add_recipe(crimes_rec) %>%
  fit(training_data)
Code
testing_data <- testing_data %>%
  mutate(pred = predict(gbdt, new_data = testing_data)$.pred_class)
Code
training_data$`Primary Type` <- as.factor(training_data$`Primary Type`)

prec <- testing_data %>% precision(truth = `Primary Type`, estimate = pred) 
sens <- testing_data %>% sensitivity(truth = `Primary Type`, estimate = pred)
spec <- testing_data %>% specificity(truth = `Primary Type`, estimate = pred)
acry <- testing_data %>% accuracy(truth = `Primary Type`, estimate = pred) 

rf_class_metrics <- bind_rows(prec, sens, spec, acry)
rf_class_metrics %>%
  kable %>%
  kable_styling("striped")
.metric .estimator .estimate
precision macro 0.4472946
sensitivity macro 0.1360080
specificity macro 0.9738990
accuracy multiclass 0.3735626

Task 3: Predicting Arrest Rates Using XGBoosted Trees

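We draw a random sample of only 100,000 rows, before dropping rows with missing values and splitting into training and testing sets, to keep the code simple and the runtime manageable. A full model could be scaled to the entire dataset using a cloud service.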
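Because the rows are drawn at random, the fitted model and its metrics will differ between runs unless the random number generator is seeded first. A minimal sketch of a repeatable draw, using a hypothetical toy data frame in place of `crimes` and an arbitrary seed value:

```r
# Hypothetical stand-in for the crimes data
toy_crimes <- data.frame(id = 1:1000, Arrest = rep(c(TRUE, FALSE), 500))

draw_sample <- function(df, n, seed) {
  set.seed(seed)  # fix the RNG state so the same rows are drawn every time
  df[sample(nrow(df), n), , drop = FALSE]
}

first_draw  <- draw_sample(toy_crimes, 100, seed = 2023)
second_draw <- draw_sample(toy_crimes, 100, seed = 2023)

identical(first_draw$id, second_draw$id)
```

In the report's own pipeline the equivalent step would be a single `set.seed()` call placed before `sample_n()` and `initial_split()`, on the assumption that both draw from R's random number generator.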

Code
subsetted_crimes <- crimes %>%
  sample_n(100000) 

subsetted_crimes <- na.omit(subsetted_crimes)

subsetted_crimes$`Primary Type` <- as.factor(subsetted_crimes$`Primary Type`)

splits <- subsetted_crimes %>% 
  initial_split(0.9, strata = `Primary Type`)

training_data <- splits %>% training()
testing_data <- splits %>% testing()
Code
crimes_rec <- recipe(Arrest ~ `Primary Type` + `Location Description` + Beat + 
                     Ward + `X Coordinate` + `Y Coordinate` + Year + Month + 
                     Hour + Weekday, data = training_data) %>%
  step_dummy(all_nominal_predictors())

xtrees <- boost_tree() %>%
  set_mode("classification") %>%
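  set_engine("xgboost", objective = "binary:logistic") # logistic objective for the two-class Arrest outcome; a squared-error regression objective does not fit a classification task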

gbdt <- workflow() %>%
  add_model(xtrees) %>%
  add_recipe(crimes_rec) %>%
  fit(training_data)
Code
testing_data <- testing_data %>%
  mutate(pred = predict(gbdt, new_data = testing_data)$.pred_class)
Code
prec <- testing_data %>% precision(truth = Arrest, estimate = pred) 
sens <- testing_data %>% sensitivity(truth = Arrest, estimate = pred)
spec <- testing_data %>% specificity(truth = Arrest, estimate = pred)
acry <- testing_data %>% accuracy(truth = Arrest, estimate = pred) 

rf_class_metrics <- bind_rows(prec, sens, spec, acry)
rf_class_metrics %>%
  kable %>%
  kable_styling("striped")
.metric .estimator .estimate
precision binary 0.8758233
sensitivity binary 0.9793787
specificity binary 0.5906040
accuracy binary 0.8809119

Our gradient boosted model predicts whether an arrest was made with fairly high accuracy (about 0.88 on the held-out testing data) using only ten predictors. This analysis can benefit police departments by showing which crimes may need extra resources or processes devoted to finding the offenders. Additionally, it can keep the citizens of Chicago informed about which crimes tend to go unresolved, indicating they may need to be wary of repeat offenders of those crimes.

Discussion

Cluster Analysis

Through our cluster analysis we found that different areas have different frequencies of crime types. From our initial cluster analysis over many crime types, we can view our table of frequencies of crimes in each of our clusters over time and see that:

  • The proportion of crimes with the type interference with a public officer increases over time in all clusters

  • In recent years, Cluster 4 appears to have a higher frequency of interference with a public officer than the other clusters

  • In most years, Cluster 1 appears to have a higher frequency of burglary than the other clusters

  • The frequency of gambling has decreased over time overall, but it used to be highest in Clusters 2 and 3.

These observations come from reading the table directly. As a next step (TO DO), we want to present it as a graph so that it is more visually appealing and interpretable, and to compute summary statistics (means, standard deviations) from it.

We also find that arrest rates differ between crimes and between areas. Arrest rates for homicides decrease over time for all clusters, with the northernmost cluster having the highest arrest rate, especially in recent years.

Criminal sexual assaults also have decreasing arrest rates over time for all clusters. In earlier years, the northernmost cluster has the highest arrest rate, but by 2022 and 2023 all clusters have similarly low arrest rates, between 0.05 and 0.1.

Interference with a public officer has the opposite trend, with arrest rates increasing over time. The northernmost cluster also has the lowest arrest rate in most years. Our hypothesis is that this may be related to the demographics of the area: police may be less likely to arrest certain groups of people, potentially more affluent ones, for interfering with an officer.

Predicting Crime Type

Our boosted model correctly predicts the crime type for 0.3735626 of the testing observations (accuracy). It does not do a good job of detecting a given crime type when that crime was actually committed (0.1360080 macro-averaged sensitivity). It does a good job of ruling out crime types that were not committed (0.9738990 specificity). Out of all observations predicted to be a certain crime type, 44.73% actually were that crime type on average across crime types (0.4472946 precision).
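The precision, sensitivity, and specificity in the Task 2 metrics table use the `macro` estimator, meaning each metric is computed per crime type and then averaged with equal weight. A minimal base R sketch of that calculation on hypothetical toy labels (the classes A, B, and C are made up; this is an illustration of the averaging, not the yardstick implementation):

```r
# Hypothetical true and predicted classes for ten observations
truth <- factor(c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C"))
pred  <- factor(c("A", "A", "A", "B", "A", "B", "C", "C", "A", "A"),
                levels = levels(truth))

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(pred, truth)

# Per-class sensitivity (recall) and precision
recall_by_class    <- diag(cm) / colSums(cm)
precision_by_class <- diag(cm) / rowSums(cm)

# Macro average: unweighted mean over classes
macro_sensitivity <- mean(recall_by_class)
macro_precision   <- mean(precision_by_class)

# Accuracy pools all observations instead of averaging per class
accuracy <- sum(diag(cm)) / sum(cm)

round(c(macro_sensitivity = macro_sensitivity,
        macro_precision   = macro_precision,
        accuracy          = accuracy), 3)
```

Since every class counts equally in a macro average, rare crime types that the model almost never predicts pull macro sensitivity well below accuracy, which is the pattern seen in the Task 2 results.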

Predicting Arrests

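Our boosted model correctly predicts the arrest outcome for 0.8809119 of the testing observations (accuracy). We do a good job of correctly predicting when someone was arrested (0.9793787 sensitivity). Our model does not do as good a job of predicting when someone was not arrested (0.5906040 specificity). Out of all positive predictions, 0.8758233 were actually positive (precision). This reading assumes that an arrest is the positive class; yardstick treats the first factor level as the event by default, so if the levels of Arrest are ordered FALSE then TRUE, the sensitivity and specificity interpretations swap. Police units may use this model to look at specific scenarios and see how they can improve their processes.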
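Which class counts as "positive" changes what sensitivity means for a binary outcome. The base R sketch below computes sensitivity twice on hypothetical toy labels, once for each factor level; the helper `sensitivity_for` and the data are assumptions made for illustration.

```r
# Hypothetical toy outcome: 6 incidents without an arrest, 4 with an arrest
truth <- factor(c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE))
pred  <- factor(c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,  TRUE, TRUE, FALSE, FALSE))

levels(truth)  # "FALSE" comes first, "TRUE" second

# Sensitivity for a chosen event level:
# correctly predicted events divided by all true events
sensitivity_for <- function(truth, pred, event) {
  sum(truth == event & pred == event) / sum(truth == event)
}

sens_no_arrest <- sensitivity_for(truth, pred, event = levels(truth)[1])  # event = "FALSE"
sens_arrest    <- sensitivity_for(truth, pred, event = levels(truth)[2])  # event = "TRUE"

c(no_arrest_as_event = sens_no_arrest, arrest_as_event = sens_arrest)
```

In yardstick, passing `event_level = "second"` to `sensitivity()`, `specificity()`, and `precision()` treats the second factor level as the event, which is one way to make sure the reported numbers describe arrests rather than non-arrests.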

Limitations and Ethics Considerations

As mentioned in the introduction, it is crucial to assess and acknowledge the limitations associated with the dataset and address the ethical implications that arise from analyses of crimes. This section aims to provide a comprehensive understanding of the constraints and ethical considerations in our analysis.

  1. Inherent Bias and Racial Profiling: This dataset, comprising all reported crimes published by the Chicago Police Department, is susceptible to inherent bias. A notable concern is the potential impact of racial profiling and racism in law enforcement practices, which may result in an overestimation of reported crimes for people of color and in neighborhoods with a majority of people of color. This bias can skew the interpretation of crime rates and contribute to an inaccurate portrayal of crime distribution among different racial groups and neighborhoods.

  2. Socioeconomic and Geographic Biases: People residing in lower-income or gang-related areas may face a higher likelihood of arrest, introducing socioeconomic and geographic biases into the dataset. This bias could lead to an overrepresentation of reported crimes in specific neighborhoods, influencing the overall crime statistics. It can also exacerbate the overestimation for people of color, as the long-lasting effects of red-lining, gentrification, and generational wealth mean that people of color may be overrepresented in lower-income areas, compounding the effects of both racism and socioeconomic biases.

  3. Reporting Discrepancies: It is also incredibly important to recognize that the dataset represents reported crimes, not the true number of crimes. Crimes that are less frequently reported, such as domestic assault and sexual assault, may be inadequately represented in this dataset. This reporting discrepancy poses a challenge in accurately assessing the prevalence of certain types of crimes and limits the generalizability of our results to only those crimes that are represented in our data.

  4. Generalizability and Limitations of Conclusions: Given the biases and reporting discrepancies above, our findings should not be generalized beyond the dataset without careful consideration, particularly for crime types that are likely to be underreported. The validity of our conclusions is contingent upon awareness of the limitations stated above.

  5. Ethical Sensitivity: The data analyzed contains sensitive information, and we recognize the ethical responsibility associated with handling such data. Special care has been taken to ensure that our analysis is conducted with the utmost integrity and respect for privacy. This includes a commitment to preventing any misuse of our results that could adversely impact certain demographic groups.

  6. Implications for Policy and Practice: Acknowledging the limitations and biases in the dataset, it is imperative to approach any policy or practice recommendations with caution. Recommendations should be guided by an understanding of the potential biases and limitations, ensuring that they do not inadvertently harm specific groups or perpetuate existing disparities.

In conclusion, our exploration of this data requires a nuanced awareness of its limitations and ethical considerations. By addressing these issues, we aim to contribute responsibly to the broader discourse on crime while recognizing the complexity of the social and ethical landscapes in which our analysis is situated.

Bibliography

  1. Ashby, Matthew P. J. 2020. “Initial Evidence on the Relationship between the Coronavirus Pandemic and Crime in the United States.” Crime Science 9 (1). https://doi.org/10.1186/s40163-020-00117-6.

  2. Ba, Bocar A., Dean Knox, Jonathan Mummolo, and Roman Rivera. 2021. “The Role of Officer Race and Gender in Police-Civilian Interactions in Chicago.” Science 371 (6530): 696–702. https://doi.org/10.1126/science.abd8694.

  3. Block, Carolyn R., and Richard Block. 1993. Street Gang Crime in Chicago. Google Books. U.S. Department of Justice, Office of Justice Programs, National Institute of Justice. https://books.google.com/books?hl=en&lr=&id=cozaAAAAMAAJ&oi=fnd&pg=PA6&dq=chicago+crime&ots=qNTfmVtjVa&sig=3KBxL5jazUY-ZBubpi1DeqlKq20#v=onepage&q=chicago%20crime&f=false.

  4. Campedelli, Gian Maria, Serena Favarin, Alberto Aziani, and Alex R. Piquero. 2020. “Disentangling Community-Level Changes in Crime Trends during the COVID-19 Pandemic in Chicago.” Crime Science 9 (1). https://doi.org/10.1186/s40163-020-00131-8.

  5. Chicago Police Department. 2011. “Crimes - 2001 to Present.” Cityofchicago.org. September 30, 2011. https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2.

  6. dc37. n.d. “Plot with Conditional Colors Based on Values in R.” Stack Overflow. https://stackoverflow.com/questions/11838278/plot-with-conditional-colors-based-on-values-in-r.

  7. finnstats. 2021. “Animated Graph GIF with Gganimate & Ggplot | R-Bloggers.” R Bloggers. May 15, 2021. https://www.r-bloggers.com/2021/05/animated-graph-gif-with-gganimate-ggplot/.

  8. “Homicide Map | City of Chicago | Data Portal.” n.d. Chicago. https://data.cityofchicago.org/Public-Safety/Homicide-Map/53tx-phyr.

  9. Huynh, Y. Wendy. n.d. 6.3 Group_by() and Ungroup() | R for Graduate Students. Bookdown.org. https://bookdown.org/yih_huynh/Guide-to-R-Book/groupby.html.

  10. Jenness, Valerie, and Ryken Grattet. 1996. “The Criminalization of Hate: A Comparison of Structural and Polity Influences on the Passage of ‘Bias-Crime’ Legislation in the United States.” Sociological Perspectives 39 (1): 129–54. https://doi.org/10.2307/1389346.

  11. Lemmer, Thomas J., Gad J. Bensinger, and Arthur J. Lurigio. 2008. “An Analysis of Police Responses to Gangs in Chicago.” Police Practice and Research 9 (5): 417–30. https://doi.org/10.1080/15614260801980836.

  12. “Return the Index of the First Maximum Value of a Numeric Vector in R Programming - Which.max() Function.” 2020. GeeksforGeeks. June 6, 2020. https://www.geeksforgeeks.org/return-the-index-of-the-first-maximum-value-of-a-numeric-vector-in-r-programming-which-max-function/.

  13. RHertel. n.d. “R - Extract Year from Date.” Stack Overflow. https://stackoverflow.com/questions/36568070/extract-year-from-date.

  14. Schreck, Christopher J., Jean Marie McGloin, and David S. Kirk. 2009. “On the Origins of the Violent Neighborhood: A Study of the Nature and Predictors of Crime‐Type Differentiation across Chicago Neighborhoods.” Justice Quarterly 26 (4): 771–94. https://doi.org/10.1080/07418820902763079.

  15. Smith, Michael R., and Matthew Petrocelli. 2001. “Racial Profiling? A Multivariate Analysis of Police Traffic Stop Data.” Police Quarterly 4 (1): 4–27. https://doi.org/10.1177/1098611101004001001.

  16. Spring, Jon. n.d. “How to Make Dots in Gganimate Appear and Not Transition.” Stack Overflow. Accessed November 18, 2023. https://stackoverflow.com/questions/56411604/how-to-make-dots-in-gganimate-appear-and-not-transition.

  17. Thomas. n.d. “Error in Adding Rows to an Empty Data Frame in R.” Stack Overflow. Accessed November 18, 2023. https://stackoverflow.com/questions/25051528/error-in-adding-rows-to-an-empty-data-frame-in-r.

  18. “Violent Crime Time of Day(per 1,000 in Age Group).” 2022. Www.ojjdp.gov. 2022. https://www.ojjdp.gov/ojstatbb/offenders/qa03401.asp#:~:text=In%20general%2C%20the%20number%20of.